INTRODUCTION

The emerging picture of the nucleus portrays a multifaceted 
environment where RNA processing events occur with accuracy, 
precision, and high resolution. Since diffusion cannot account 
for the speed and coordination of the molecular events occurring 
within its matrix, the nucleus must depend on precisely articulated 
macromolecular architectures and active transport mechanisms 
to achieve adequate throughput and signal to noise performance 
(Lanctot et al., 2007; Misteli, 2007). In addition to the execution 
of baseline processing of RNA, the extensive network of RNA 
interaction machineries must respond to incoming physiological 
signaling, such as stress and cues from the physical environment 
(McKee and Silver, 2007; Sharma and Lou, 2011), by making rapid 
but precise changes at decision points, while at the same time 
maintaining robustness of the overall network. In effect, the entire 
nuclear space is a finely tuned RNA processing machine, designed 
to maintain accuracy in the dynamic and reversible regulation of 
myriads of transcriptome processing events simultaneously. Since 
the expansion of transcriptome processing increases the compu- 
tational plasticity (Herbert and Rich, 1999) and the information 
processing capacity of biological networks (Mattick, 2007; St. Lau- 
rent and Wahlestedt, 2007), several authors argue that biological 
complexity itself has RNA complexity at its core (Licatalosi and 
Darnell, 2010). 

Considering only current knowledge of these networks, and 
without extrapolating to as yet undiscovered regulatory intricacies, 
their performance already exceeds what current systems biology 
models and mechanisms can explain. Their diversity of specific 
functions, and the finely tuned regulation of those functions in 
response to physiological signals, suggest the existence of 
undiscovered mechanisms and network design principles at work to 
maintain robustness of the RNA output of a cell. In fact, recent 
studies of disease mechanisms suggest that humans can tolerate 
little loss of signal to noise performance in the nucleus. 
Healthy physiological function depends on the precision, relia- 
bility, and accuracy of the nuclear RNA processing machine, as 
processing errors in RNA molecules often lead to serious diseases 
(Garcia-Blanco et al., 2004; Cooper et al., 2009; Venables et al., 
2009; Licatalosi and Darnell, 2010; Ward and Cooper, 2010; Jia 
et al., 2012). 

It is in this context that we would like to consider the genomic 
"dark matter," one of the major mysteries of the post-genome era. 
Perhaps no other topic in contemporary genomics has inspired 
such diverse viewpoints as the 95+% of the genome, previously 
known as "junk DNA," that does not code for proteins. Reports 
of pervasive transcription of these vast "dark matter" regions, 
combined with frequent identification of families of long or very 
long non-coding RNAs (lncRNAs) originating from them, have opened 
new chapters of both discovery and controversy. The observation 
that the percentage of "dark matter" genomic sequence 
correlates monotonically with organismal complexity, for every 
species sequenced to date (Taft et al., 2007), has inspired theories 
proposing a central role for these regions in the information 
processing of complex organisms (Mattick, 2007; St. Laurent and 
Wahlestedt, 2007). Yet, while an increasing number of specific 
interactions between lncRNAs and other biological molecules have 
demonstrated functions for a number of dark matter transcripts 
(Wang and Chang, 2011), a global concept of function has not yet 
emerged. In effect, the original reports of pervasive transcription 
(Kapranov et al., 2002, 2007b; Carninci et al., 2005, 2008; Katayama 
et al., 2005) of the mammalian genome have faded somewhat, with 
focus instead on separately developed lists of lncRNAs detected in 
specific experiments or filtered by certain properties that hint at 
functionality (Willingham et al., 2005; Guttman et al., 2009; Khalil 
et al., 2009; Wai et al., 2010; Askarian-Amiri et al., 2011; Khaitan 
et al., 2011). These lists of lncRNAs usually cover only a few percent 
of the genome, representing only a small fraction of the originally 
reported pervasiveness of dark matter transcription, and typically 
sample only intergenic space, as introns of known genes are usually 
assumed to represent pre-mRNAs. For example, our recent work has shown the 
presence of numerous very long transcribed regions of intergenic 
genomic space not currently covered by the lincRNA annotations 
(Kapranov et al., 2010) and has shown that introns of mouse 
genes produce stable RNAs regulated separately from the mature 
protein-coding RNAs (St. Laurent et al., submitted). Partly due 
to this uncertainty, some authors have cast doubt on the impor- 
tance of dark matter transcripts, labeling them transcriptional 
noise (Brosius, 2005; Struhl, 2007; van Bakel and Hughes, 2009; 
Robinson, 2010; van Bakel et al., 2010), or even arguing that they 
largely represent "fragments of known pre-mRNAs" (van Bakel 
et al., 2010). Even the existence of much of the dark matter RNA 
implied by the early reports of pervasive transcription has stirred 
recent controversy (van Bakel et al., 2010). On balance, a common 
view in the field holds that while there is a collection of lncRNAs 
with specific interactions and functions, they exist among a larger 
collection of dark matter transcriptional noise. 

As the controversy surrounding the function of "dark matter" 
RNA continues, a number of recent studies have provided more 
comprehensive datasets, using improved methodologies to confirm 
its existence and, more importantly, to measure its relative mass. 
A recent investigation designed to capture and measure non-exonic 
signals revealed, surprisingly, that dark matter RNAs comprise a 
majority of non-ribosomal, non-mitochondrial RNA in human cells 
(Kapranov et al., 2010). We also know that the nucleus is rich in 
dark matter RNA (Cheng et al., 2005). Since the majority of 
protein-coding RNAs reside in the cytoplasm, the fraction of dark 
matter RNA in the nucleus is likely to be many fold higher than 
that of protein-coding RNAs. 

Considering the vital importance of maintaining the perfor- 
mance of nuclear processing of all types, the nuclear molecular 
machineries would not tolerate the accumulation of large amounts 
of non-functional RNA molecules. Any significant population of 
such molecules would at best represent a large input of noise into 
the fine-tuned computational machinery of nuclear processing, 
not likely to benefit the performance of the nucleus or the cell as 
a unit. In practical terms, if dark matter had no biological func- 
tion, the high performance and signal to noise ratios of the nuclear 
RNA processing machineries would logically conflict with the high 
levels of dark matter now documented in human cells (Kapranov 
et al., 2010). In other words, the currently emerging picture of 
the nucleus contains a paradox: a nuclear micro-environment 
simultaneously populated by high concentrations of precision 
RNA processing machineries, and by an astonishing level of noise 
from dark matter RNA. How can the nucleus precisely regulate 
such highly accurate processing events in tens of thousands of 
transcripts simultaneously, while ignoring the massive amount 
of inherent noise from dark matter RNA existing in the same 
nuclear space? 

To resolve this apparent paradox, and to provide a mechanism 
for global function of dark matter RNA, in this article we 
present a theory in which dark matter RNA plays a role in the 
generation of a landscape of spatial micro-domains coupled to 
the information signaling matrix of the nuclear landscape. Within 
and between these micro-domains, dark matter RNAs addition- 
ally function to tether RNA interacting proteins and complexes of 
many different types, and by doing so, allow for a higher perfor- 
mance of the various processes requiring them at ultra-fast rates. 
This improves signal to noise characteristics of RNA processing, 
trafficking, and epigenetic signaling, where competition and dif- 
ferential RNA binding among proteins drives the computational 
decisions inherent in regulatory events. 

THE SYSTEM WIDE PERFORMANCE CHARACTERISTICS 
OF NUCLEAR RNA PROCESSING MACHINERIES 

It is estimated that an average human cell contains 300,000 
mRNAs (Hastie and Bishop, 1976), each containing on average 
10 exons, a start site, and a poly A+ tail. Thus, every such average 
molecule has to go through at least 18 splice site selections (a 
splice donor and an acceptor site for each of its 9 introns) plus 
selections of the start site and the polyadenylation site, or 20 
events in total. Across all 300,000 molecules, a minimum of 
6M processing events has to occur to generate this diversity. This 
does not take into account (i) all subsequent base modification 
such as RNA editing, N6-methyladenosine, 5'-cap, (ii) subsequent 
cleavage events, or (iii) transportation of these RNA molecules 
to their sites of function or into well demarcated nuclear storage 
for later use. Nor does it account for the polyA- RNA population, 
which exceeds that of the polyA+ RNA severalfold. Also, if 
one were to include the ribosomal RNA, which represents ~95% 
of all cellular RNA (Raz et al., 2011) and is also processed 
and modified, then the minimal number of cellular processing 
events needed to accommodate the real complexity 
of RNAs within a single nucleus is likely to be in the tens of 
millions. 
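
The estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming that each intron contributes one donor and one acceptor site selection and that each transcript needs one start site and one polyadenylation site selection. The function name is hypothetical; only the input figures (300,000 mRNAs, 10 exons per mRNA) come from the text.

```python
def min_processing_events(n_mrnas, exons_per_mrna):
    """Lower bound on site-selection events, per transcript and per cell.

    Assumption (illustrative): one donor and one acceptor selection per
    intron, plus one start-site and one polyadenylation-site selection.
    """
    introns = exons_per_mrna - 1
    splice_site_selections = 2 * introns   # donor + acceptor per intron
    end_selections = 2                     # start site + poly(A) site
    per_transcript = splice_site_selections + end_selections
    return per_transcript, n_mrnas * per_transcript


per_transcript, total = min_processing_events(300_000, 10)
print(per_transcript, total)  # 20 6000000
```

Under these assumptions the lower bound is 20 events per transcript and 6 million events per cell, before editing, cleavage, trafficking, polyA- RNA, or rRNA processing are counted.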

As a vital step in transcript processing, RNA editing offers fur- 
ther insight into the high level of orchestration of nuclear RNA 
processing machineries. Adenosine deaminase acting on RNA 
(ADAR) mediates adenosine to inosine (A-to-I) RNA editing in 
dsRNA molecules, which often results in distinct downstream 
physiological outcomes for the edited RNAs. ADAR RNA edit- 
ing frequently targets coding regions of mRNAs that encode ion 
channels and other components of the synaptic release machinery 
(Hoopengardner et al., 2003; Seeburg and Hartner, 2003). 
Intronic non-coding sequences with extensive complementarity 
to upstream or downstream exons containing the adenosine des- 
tined to be edited can form simple exon-intron hairpin structures 
(Higuchi et al., 1993; Burns et al., 1997; Hanrahan et al., 2000; 
Wang et al., 2000) or more complex RNA secondary structures 
such as a pseudoknot (Reenan, 2005). RNA editing in mRNAs 
often generates protein products that are not encoded by the literal 
genomic information, since upon translation the ribosomal 
machinery interprets inosines as guanosines (Basilio et al., 1962), 
resulting in amino acid substitutions. Various studies in different 
genetic model organisms suggest that RNA editing of mRNAs can 
result in profound changes in protein function (Rosenthal and 
Bezanilla, 2002; Bhalla et al., 2004; Ingleby et al., 2009). 

Execution of this type of modification requires a great deal 
of precision from the RNA processing machinery in terms of 
identification of RNA molecules to be edited, sites of editing 
within these molecules, and the degree of editing at any 
given site. Editing can be separated into "pinpoint" and "prolific" 
types. The former results in editing at specific sites in specific 
RNA molecules. In Drosophila, for example, the nervous system 
editing sites generally demonstrate a high level of conservation 
across 12 fly genomes, representing 85 million years of evolutionary 
divergence (Hoopengardner et al., 2003). This high level 
of conservation includes sites that code for levels of transcript 
editing in the adult fly as low as a few percent, demonstrating 
physiological sensitivity for this form of transcript processing. In 
addition, some RNAs that form extensive dsRNA structures, such 
as non-coding transcripts, sense-antisense RNAs bound to each 
other, and exogenous RNAs can serve as ADAR substrates des- 
tined for prolific editing (Bass, 2002; Nishikura, 2006), resulting 
in up to 50% A-to-I conversions (Nishikura et al., 1991; Polson 
and Bass, 1994). Choice of such substrates is also controlled, as 
not every RNA molecule that can form dsRNA will be edited and 
not every adenosine in molecules that are substrates for ADAR is 
edited. The fate of such inosine-rich RNA molecules is different 
from that of the ones subject to "pinpoint" editing. They can in fact 
have at least two fates: retention within the nuclear compartment through 
dependent localization by p54nrb/Vigilin (Zhang and Carmichael, 
2001; Wang et al., 2005) and cytoplasmic degradation by Tudor-SN 
(Scadden, 2005). 

Furthermore, the ADAR information processing pathway is 
sensitive to environmental stimuli in addition to stress responses. 
Editing analysis of K+ channel mRNAs between Arctic and trop- 
ical octopus species revealed substantial differences in editing 
levels, which are mediated by temperature variations (Garrett and 
Rosenthal, 2012). In humans, the three ADAR genes can undergo 
alternative splicing to produce over a dozen isoforms with hetero- 
geneous RNA target specificities. The inflammatory cascade results 
in a dramatic induction of many of these ADAR isoforms, result- 
ing in a widespread increase of edited RNAs during mammalian 
inflammation (Yang et al., 2003a,b). Since intronic sequences form 
dsRNAs with coding regions to serve as ADAR substrates, editing 
must precede splicing. Under these circumstances, a regulatory 
mechanism must exist to coordinate an extensive network of RNA 
processing machines so that they operate with high fidelity and 
generate dynamic responses to internal and external stimuli. 

In addition to the plethora of transcript variation discussed 
above, the RNAs produced subsequently traffic into predeter- 
mined subcellular localizations. Many transcripts interact with 
sets of trafficking proteins to migrate to specific nuclear locations 
such as interchromatin granules (ICGs) or speckles (Spector and 
Lamond, 2011), for further processing in response to transient 
physiological signals. Transcripts can also undergo complex cleav- 
age events, followed by 5' capping, in response to little understood 
signals and circumstances (Affymetrix/CSHL ENCODE Project, 
2009; Mercer et al., 2010). CTN RNA represents an intriguing 
example where both of these mechanisms are combined. 
Within minutes of amino acid deprivation or similar cellular 
stress, signals transduced into the nucleus result in cleavage of 
the sequestered CTN RNA, and the release and transport to the 
cytoplasmic translation machinery of the amino acid transporter 
for which the cleaved RNA product codes (Prasanth et al., 2005). 
In some cases, cleavage events themselves produce small RNAs, 
whose activities feed back into splicing decisions, as in the example 
of the HBII52 snoRNA, which is cleaved from intronic RNA 
templates in the SNURF-SNRPN locus, and interacts with the 
serotonin 2C mRNA to regulate its alternative splicing (Kishore 
and Stamm, 2006). 

Considering all of these regulatory layers together, millions of 
RNA processing events have to happen with accuracy and precision 
to generate the complexity of RNA present in a cell at any given 
moment. Many of these events require computation-like decision 
making as multiple alternative outcomes are available to a cell. In 
some cases, a single locus can produce hundreds, or even thou- 
sands of alternative products of RNA processing. The Drosophila 
Dscam locus for example, can produce 37,000 distinct isoforms 
from one "gene" (Wojtowicz et al., 2004). High throughput studies 
using RNA-seq revealed that 94% of human genes undergo alternative 
splicing in some tissue (Wang et al., 2008). In light of this 
output volume, such widespread reliance on alternative splicing 
points to the magnitude of the regulatory challenge facing nuclear 
splicing machineries. Since the RNA signals that code for splicing 
events contain relatively low sequence complexity, and frequently 
diverge from consensus sequences (Egecioglu and Chanfreau, 
2011), they provide only modest energetic and informatic vectors 
to support the accuracy and reliability of high volume splicing 
output. As a result, achieving a correct splicing decision at a 
given site usually depends on a precise sequence of combinato- 
rial events, composed of multiple protein and RNA elements, and 
even chromatin adaptor systems (Luco et al., 2011), acting both in 
competition (Witten and Ule, 2011), and in cooperation (Hertel, 
2008; Xiao and Lee, 2010). 

Performance related challenges would face any system designed 
to produce such a wide array of molecular outputs. Yet, even 
with these challenges, the systems performance of nuclear RNA 
processing appears to be surprisingly high. A recent investiga- 
tion of cell-to-cell variability of alternative splicing determined 
that non-transformed cells maintained very low splicing isoform 
variability between individual cells, and concluded that mam- 
malian cells minimize fluctuations in mRNA isoform ratios by 
tightly regulating the splicing machinery (Waks et al., 2011). 
Evidence increasingly supports precise and finely tuned regulation 
of transcriptome processing events as a rule in the nucleus. For 
example, a growing number of reports describe links between 
perturbations in splicing (Cooper et al., 2009; Ward and Cooper, 
2010), or transcript localization (Faghihi et al., 2008), and 
diseases. This trend underlines the importance of high precision and 
accuracy in RNA based machineries under healthy physiological 
conditions. 

Thus, using as yet little known organizational principles, 
nuclear RNA processing machineries not only produce processed 
and modified RNAs with high efficiency and accuracy, but imple- 
ment a large scale integration of dynamic physiological signals, 
which then drive precise regulatory control and plasticity in 
response to a myriad of signaling events. From this perspec- 
tive, the nuclear RNA processing machineries, and the dynamic 
structural environment surrounding them, must orchestrate their 
tasks with high accuracy and precision. They must recognize and 
distinguish processing motifs in RNA structural signals with high 
sensitivity and yet reject sub-optimal motifs with high selectivity. 
Furthermore, the catalytic processes involved in the processing 
steps must occur with few or no errors. Once formed, the 
products must enter downstream trafficking pathways, and RNA 
whose presence is no longer required must be rapidly degraded 
to avoid introducing noise into earlier steps in the processing 
pathways due to RNA-waste accumulation. Finally, the entire 
multilayer system must maintain sufficient plasticity to quickly 
respond to thousands of potential information signals from out- 
side the nucleus to generate appropriate alterations in processing 
steps at determined loci in response to changing physiological 
signals. 

While the mystery of how all of this occurs within the space 
of the nucleus remains unresolved, it depends on the chore- 
ography of combinatorial interactions between transcripts and 
hundreds of RNA binding proteins (RBPs). RBPs interact with 
complex combinations of primary and secondary structure signals 
in RNA, and function in both cooperative and competitive types 
of architectures, as documented in splicing regulation (Darnell, 
2006; Sharma and Black, 2006; Ule and Darnell, 2007; Licatalosi 
et al., 2008; Hallegger et al., 2010), and more recently in 
chromatin signaling (Tsai et al., 2010; Zhao et al., 2010). The 
first RBPs to interact with a given nascent transcript can influ- 
ence the subsequent folding steps of that RNA, and thereby 
change the downstream distribution of protein interactions for 
that RNA. The process of differential recognition of nascent 
RNA information signals by the correct RBP must occur at a 
pace complementary to that of transcription as well as subse- 
quent processing or chromatin signaling. To maintain plasticity 
for the accurate transduction of environmental signals, the RNA- 
protein interaction landscape must somehow achieve an extraor- 
dinary coupling between computational, catalytic, and structural 
elements. 

EMERGING FEATURES UNDERLYING THE HIGH 
PERFORMANCE OF NUCLEAR RNA PROCESSING 
MACHINERIES 

As investigations continue to reveal the depth and performance 
of nuclear RNA processing functions, the challenge for systems 
biologists grows more daunting. Current systems biology model- 
ing cannot account for the precision, accuracy, or signal to noise 
ratios achieved by RNA processing machineries. Nevertheless, the 
transcriptome-proteome interface in the nucleus contains a vast 
store of dynamical information. To consistently make effective 
use of this information, the nuclear systems network architecture 
must have a number of key design features, including maintenance 
of reversibility, temporal coherence (the timing and velocity of 
information processing between network layers), and the ability 
to resolve logical conflicts over the spatial extent of the networks 
that comprise the system. To help explain how nuclear RNA pro- 
cessing networks harness the power of that information, a number 
of concepts have emerged. 

DYNAMIC SCAFFOLDING MAXIMIZES INFORMATION FLOW 

Biological molecules in the nuclear space exist in a constrained 
environment where diffusion occurs relatively slowly (Albert 
et al., 2012). Thus, biochemical kinetics can represent a hurdle for 
adequate performance of complex multistep processing pathways, 
as the entire interdependent system must minimize bottlenecks 
and flow imbalances. In order to overcome these physical limita- 
tions, RNA processing machineries must rely on highly articulated 
spatial domains, where local environments transduce information 
efficiently. Global scaffolding can create these performance-enhancing 
interaction topologies. For example, an RNA 
scaffold can increase the local concentration of an RBP, such as 
Nova1, and produce a corresponding increase in the signal to noise 
performance of Nova's influence on splice site selection within that 
particular spatial domain (Figure 1A). Increasing the local 
concentration of these factors also permits improved interaction kinetics 
with a smaller ΔG, making the interactions more reversible (Figure 1B 
and also below). 
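
The relationship between local concentration, occupancy, and binding free energy can be illustrated with a simple equilibrium calculation. The sketch below assumes 1:1 binding at equilibrium with free protein in excess. The concentrations, the 90% target occupancy, and all function names are hypothetical values chosen for illustration, not measurements for Nova or any other RBP.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 310.0    # roughly 37 degrees C


def occupancy(conc_molar, kd_molar):
    """Fraction of sites bound for simple 1:1 binding at equilibrium."""
    return conc_molar / (conc_molar + kd_molar)


def kd_for_occupancy(conc_molar, target):
    """Kd that gives the target occupancy at a given free protein concentration."""
    return conc_molar * (1.0 - target) / target


def binding_free_energy(kd_molar):
    """Standard binding free energy in kcal/mol (1 M standard state)."""
    return R_KCAL * T_KELVIN * math.log(kd_molar)


# Hypothetical scenario: a scaffold enriches an RBP 100-fold locally.
BULK, LOCAL = 10e-9, 1e-6   # 10 nM in bulk vs 1 uM near the scaffold
for conc in (BULK, LOCAL):
    kd = kd_for_occupancy(conc, target=0.9)
    dg = binding_free_energy(kd)
    print(f"[RBP]={conc:.0e} M  required Kd={kd:.2e} M  dG={dg:.1f} kcal/mol")
```

Under these assumptions, a 100-fold local enrichment lets an interaction that is weaker by RT ln(100), about 2.8 kcal/mol at 310 K, reach the same occupancy. This is one way to read the claim that scaffolding permits more reversible interactions.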

Structural features in the nucleus that enhance transcriptional 
control and RNA processing contain a large amount of spatially 
and temporally coded information content. The nucleolus for 
example appears to depend on RNA secondary structure signals 
for its effective formation, as the absence of these RNA secondary 
structures resulted in complete disarray of the nucleolus (Peng 
and Karpen, 2007). The reversibility of such events in nuclear 
architecture means that elements responsible for their formation 
also encode sufficient information to detect external signals and 
respond with disassembly and transport of components to other 
spatial domains or downstream processing pathways (Spector and 
Lamond, 2011). 

COMPETITION AND COMPUTATION AT THE 
TRANSCRIPTOME-PROTEOME INTERFACE 

With their unique combination of primary, secondary, and tertiary 
structure, RNAs offer a multiplicity of ways to code for 
biological information. At the core of this system is a language 
of RNA-protein, and RNA-RNA/DNA recognition implemented 
by RNA's unique ability to couple analog and digital signals (St. 
Laurent and Wahlestedt, 2007). The efficient transduction of that 
information often depends on its timely recognition by the appropriate 
RBPs present in the immediate vicinity of an elongating 
primary transcript. As the transcript emerges from Pol II, it begins 
to fold. That folding is also influenced by the RBPs that interact 
with it, which help determine which of many possible folding 
paths the RNA takes. If the correct RBPs are not close enough 
to associate quickly with the RNA, then the RNA could 
take another folding path, which would in turn lead to a different 
set of downstream events, as in the case of Nova splicing 
proteins and their influence on upstream or downstream splice 
site choice. Thus, the presence or absence of a given distribution of 
RBPs in the vicinity of a nascent RNA chain will influence a series 
of "memory states" that then modulate other processing events 
downstream. With such a large space of potential RNA-protein 
interactions, and the requirement for dynamic reversibility of 
many of their associated signaling events, the system faces a major 
challenge to achieve an adequate signal to noise ratio for effective 
function. 

FIGURE 1 | (A) Information currently available about the regions of 
dark matter transcription and the actual RNA molecules made from 
these regions comes from various types of experiments and databases. 
There is relatively little overlap between these different databases, 
suggesting that the actual extent of dark matter transcription is far 
greater than any one database suggests. (B) A theoretical curve showing 
expected results of the fraction of the genome that is transcribed as a 
function of the number of biological sources whose RNA is profiled. 
The coverage of transcribed genome by protein coding genes including 
their introns is 42% and lincRNAs bring it up to 58%. However, the full 
extent of the transcribed genome is expected to be much greater 
than that. 

Active competition for recognition sites on nascent RNA signals 
directly addresses these problems. Splicing regulation makes abundant 
use of competing RBPs to enhance the sensitivity, specificity, 
and regulatory control of splice site decision commitment (Ule and 
Darnell, 2006; Chen and Manley, 2009). Examples include PTB 
protein, which antagonizes Nova (Polydorides et al., 2000) at overlapping 
recognition sites, establishing a sensitive switch between 
two splicing choices. Interesting examples from spliceosome quality 
control also demonstrate the importance of reversibility, such 
as involvement of ATPase Prp16p in both forward and discard 
splicing pathways (Koodathingal et al., 2010). 

Competition may also drive accurate computation in the small 
RNA regulatory pathways, with duplex regions competing for 
recognition by ADAR vs Drosha/Dicer, with contrasting outcomes 
depending on which protein prevails (Nishikura, 2006). Similarly, 
many epigenetic signaling events may be mediated by competitive 
interactions between lncRNAs and protein components of signaling 
machineries (Lee, 2011). All of these regulatory mechanisms 
require effective concentrations of interacting proteins to achieve 
adequate signal to noise ratios. The Lin28-let7 miRNA interaction 
provides an interesting example of specificity that would be difficult 
to achieve with low protein concentrations (Nam et al., 2011; 
Piskounova et al., 2011). 
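
The switch-like behavior of two RBPs competing for an overlapping recognition site, as discussed in this section, can be sketched with the standard equilibrium expression for competitive binding. This is an illustrative model only: the proteins are labeled A and B (standing in for an antagonistic pair such as Nova and PTB), and the concentrations and dissociation constants are hypothetical.

```python
def competitive_occupancy(conc_a, kd_a, conc_b, kd_b):
    """Equilibrium occupancy of one site by two mutually exclusive binders.

    Returns (fraction bound by A, fraction bound by B, fraction free).
    Assumes free protein concentrations are not depleted by binding.
    """
    a = conc_a / kd_a
    b = conc_b / kd_b
    denom = 1.0 + a + b
    return a / denom, b / denom, 1.0 / denom


# Hypothetical values: equal affinities, B held at 200 nM, A varied.
KD_A = KD_B = 50e-9
CONC_B = 200e-9
for conc_a in (20e-9, 200e-9, 2000e-9):
    frac_a, frac_b, free = competitive_occupancy(conc_a, KD_A, CONC_B, KD_B)
    print(f"[A]={conc_a * 1e9:6.0f} nM  A={frac_a:.2f}  B={frac_b:.2f}  free={free:.2f}")
```

In this toy model, raising the concentration of A at fixed B shifts the site from mostly B-bound to mostly A-bound. When both proteins are scarce the site is mostly unoccupied, which is consistent with the point that such switches need adequate effective concentrations.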

REVERSIBILITY AND FEEDBACK LOOPS 

Erasure of information presents a challenge for any complex 
system (Lloyd, 2001). In biological systems, thermodynamic con- 
straints make the cost of information innately high, and yet its 
value can oscillate from vital to worthless or even harmful in sec- 
onds once the message or a signal encoded in it is transduced. 
The dynamics of this "volatile market" reality make erasure of bio- 
logical information a high priority in any system, but especially 
in the nucleus where many network pathways converge. While 
DNA retains the permanent information, a large majority of the 
dynamical information exists within the transcriptome, as com- 
binatorial accumulations of RNA-protein and RNA-RNA/DNA 
interactions. 



Not surprisingly, reversibility is a key feature of information 
coding at the transcriptome-proteome interface. The conforma- 
tional flexibility of RNAs, especially ncRNAs whose secondary 
structures are not constrained by coding regions, and the dynamic 
changes in their structure that can occur in response to protein 
binding and environmental signals provide not only increased 
symbolic information density, but contribute to the reversibil- 
ity of RNA-protein interactions. Proteins that bind RNA also 
tend to contain natively unstructured regions. This could be 
the basis for structural articulation (i.e., the incorporation of 
information containing motifs and elements into nuclear scaf- 
folding structures) that improves precise temporal and spatial 
choreography of RNA processing machineries. For example, inter- 
actions between RNA and their cognate proteins often involve 
natively unstructured regions in the protein, and similarly flex- 
ible structures in the RNA (Leulliot and Varani, 2001). These 
regions of evolutionarily coded local disorder contribute use- 
ful properties for information processing. Precisely orienting 
them within an articulated regional structure increases their 
sensitivity, specificity, and reversibility, thereby contributing 
directly to the throughput and precision of nuclear machiner- 
ies. When these regions form a stepwise interaction with their 
RNA target, the entropy of the complex is decreased, thereby pro- 
ducing an "entropic spring" effect, which enhances reversibility 
when the interaction is no longer required (Tompa and Cser- 
mely, 2004). Together with reversibility of individual interactions 
within RNA-protein interaction networks, frequent feedback 
loops support the reversibility of these networks. These fea- 
tures operate cooperatively to facilitate the timely erasure of 
information, and the finely tuned response of RNA processing 
machineries to changes in signaling and various environmental 
conditions. 

Thus, a central part of our argument maintains that the performance 
and throughput of nuclear RNA processing machineries 
require the functional coupling of well-articulated spatial and 
temporal landscapes in order to maximize the flow of biological 
information through the components of RNA processing 
networks. 

THE DARK MATTER INTELLIGENT SCAFFOLD 

IMPLICATIONS OF THE PREPONDERANCE OF CELLULAR DARK 
MATTER RNA IN MAMMALIAN CELLS 

Several years ago, John Mattick presented the concept of the 
nucleus as an "RNA machine" (Amaral et al., 2008), arguing that 
much of the information processing in the nucleus occurs through 
RNA intermediates, and that ncRNA overcomes the prohibitive 
regulatory overhead associated with saturated protein-protein 
regulatory interactions in this environment (Gagen and Mattick, 
2005). From the point of view of information theory, this implies 
that the RNA content of the nucleus functions in a manner 
roughly equivalent to an information channel, and that the "chan- 
nel capacity" (information throughput) of this system depends 
on the available degrees of freedom of the combined popula- 
tion of RNA molecules contained in the system. Consequently, 
RNA quality control and degradation machinery must actively 
pursue the elimination of non-functional RNAs that would represent 
noise to the "RNA machine." Yet, with the recent discovery 
that dark matter RNA makes up the majority of cellular RNA by 
mass (Kapranov et al., 2010), and an even greater majority in the 
nucleus, it appears that this enigmatic class of RNA does not represent 
noise in the RNA based information channel of the nucleus, 
and instead likely comprises an integral part of the information 
channel itself. 

The information-containing structural features of dark matter
RNAs and their ability to interact with the nuclear proteome
appear similar to those of their coding counterparts. If the primary
sequence patterns and secondary structure motifs that determine
protein interactions occur with similar densities in both classes
of RNAs, then both must exist in the nucleus in complex
with proteins. If dark matter RNAs represented noise or spurious
transcription, their predominant mass would compromise the signal
to noise performance of RNA processing machineries throughout the
nucleus, as these machineries depend on the information derived
from such interactions. Instead, its high concentration suggests that
dark matter RNA functions at the core of the multilayer nuclear
"RNA machine" (Amaral et al., 2008).

DARK MATTER RNA ESTABLISHES A DYNAMIC AND REVERSIBLE 
MICRO-PARTITIONING OF NUCLEAR SPACE 

The large amount of dark matter RNA in the nucleus establishes
the basis for the "intelligent scaffold" concept. Each dark matter
RNA acts either in cis or in trans, depending on its own
information content (complex combinations of primary sequence motifs
and secondary structures) and the proteins with which it interacts.
Long dark matter RNAs can form several types of interactions with
DNA and other RNAs inside spatial domains of chromatin. These
can involve direct interactions between RNA and DNA, similar to
what occurs between pRNAs, transcribed from regions in between
rRNA genes, and the T0 element in rRNA promoters (Mayer et al.,
2006; Schmitz et al., 2010). Proteins can also mediate the
interactions, as recently demonstrated for XIST and transcription factor
YY1 (Jeon and Lee, 2011). Alternatively, proteins or RNAs can
use co-transcriptional targeting, where the transcript is tethered
during transcription by RNA polymerase, similar to the
mechanism of TAR RNA targeting by the HIV TAT protein (Brady
and Kashanchi, 2005). Transcriptional targeting may also occur
with the short ncRNAs transcribed from the 5' ends of many
human genes (Wei et al., 1998; Kapranov et al., 2007a; Kanhere
and Jenner, 2012). Dark matter RNAs have all three of these
mechanisms available to mediate their interactions with DNA and
other RNAs, providing a large combinatorial basis for the
formation of flexible complexes that drive spatial and computational
integration.

As these RNAs accumulate into spatial micro-domains
surrounding one or more genomic loci, they establish a region of
nuclear space under their influence, which in turn attracts a
variety of molecules. The RNAs can interact with many proteins, and
other large and small RNAs, often with relatively low affinities,
which results in a temporally and spatially distributed
macromolecular landscape around that locus (see Figure 2). Since these
molecules function primarily in transcriptome processing and
epigenetic regulation, the dark matter guided landscapes would
facilitate the structural and computational operations of both
systems, as well as catalyze crosstalk between them. In this manner,
dark matter RNAs can effectively establish finely tuned
concentration gradients of epigenetic signaling and RNA processing
proteins (and small RNAs) for efficient operation of these
systems. An intriguing example of this has recently been described as
a "molecular cage" for PRC1 complexes. The "molecular cage"
apparently uses a combination of methylated H3K27 moieties
and low affinity binding sites on nascent lncRNAs to increase the
local concentration of PRC1 for chromatin signaling (Beisel and
Paro, 2011).

The intelligent scaffold mechanism facilitates the accumulation
of higher concentrations of RBPs (and small RNAs) within
chromatin regions, as well as the micro-partitioning of these regions
at an optimal resolution for RNA processing, epigenetic
signaling, and transcript expression regulation. Macromolecules within
these micro-domains can dissociate from their low affinity
binding sites in these dark matter rich micro-regions as they find higher
affinity sites in nascent strands emerging from RNA Pol II
transcription. Abundant sites of alternative localization in dark matter
equate with more effective differential recognition of RNA motifs
by competing RBPs, and increased reversibility of signal
transduction in regulatory events. The key here is that the signal to noise
ratio is not driven only by the size of ΔG, but by the ratio of the ΔG
of "protein A" to the ΔG of "protein B," or the ratio of the ΔG of
site 1 of protein A on the "target" nascent strand RNA molecule to
the ΔG of site 2 of the same protein A on the "repository" dark
matter RNA molecule. This is shown as "Biological Information
Content" in Figure 2. If both ΔGs are large compared to their
difference, then the signal is low and the noise is high. A recent
experiment that used RNAi knockdown to reduce the expression levels
of the splicing regulator SRSF1 confirmed the importance of high
concentrations of RNA processing proteins to maintain adequate
signal to noise ratios. Lowered concentrations of SRSF1 markedly
increased the variance of splicing isoform ratios of the target
transcript, measured in populations of single cells (Waks et al., 2011).
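
The point that discrimination depends on how two binding free energies compare, rather than on their absolute size, can be illustrated with a minimal sketch. It assumes simple equilibrium thermodynamics, in which the relative occupancy of two competing sites follows a Boltzmann factor of their free energy difference. That model, the function names, and the numerical ΔG values (in kcal/mol) are illustrative assumptions and do not come from the text.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)


def occupancy_ratio(dg_target: float, dg_repository: float,
                    temperature_k: float = 310.0) -> float:
    """Equilibrium occupancy ratio, target site : repository site.

    Assumes independent sites present at equal concentration, so the
    ratio is the Boltzmann factor of the free energy difference.
    """
    delta_delta_g = dg_target - dg_repository
    return math.exp(-delta_delta_g / (GAS_CONSTANT * temperature_k))


def relative_contrast(dg_a: float, dg_b: float) -> float:
    """Free energy difference relative to the larger of the two magnitudes."""
    largest = max(abs(dg_a), abs(dg_b))
    if largest == 0.0:
        return 0.0  # two zero energies carry no contrast
    return abs(dg_a - dg_b) / largest


# Two hypothetical site pairs with the same 0.5 kcal/mol difference.
pairs = {"strong pair": (-12.0, -11.5), "weak pair": (-2.0, -1.5)}
for label, (dg_target, dg_repository) in pairs.items():
    ratio = occupancy_ratio(dg_target, dg_repository)
    contrast = relative_contrast(dg_target, dg_repository)
    print(f"{label}: occupancy ratio {ratio:.2f}, contrast {contrast:.3f}")
```

In this toy model both pairs give the same occupancy ratio, because only the difference enters the Boltzmann factor, while the relative contrast is much smaller for the strong pair. The second measure corresponds to the situation described above, where both ΔGs are large compared to their difference.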



Frontiers in Genetics | Non-Coding RNA | April 2012 | Volume 3 | Article 57 | 6

St. Laurent et al. | Dark matter RNA




FIGURE 2 | The information content of a hypothetical dark matter RNA.
Combinations of primary sequence and secondary structure form high affinity
interaction sites with high information content (left side). These specific
interactions have a large ΔG. At the other end of the spectrum, the same
RNA can have a large number of relatively non-specific interaction sites that
nevertheless have biological information content and functional significance.
Their absence would result in subtle loss of signal to noise characteristics
across many affected pathways that could be unrelated to the RNA.



Adjacent micro-partitions could favor higher concentrations 
of some proteins over others, due to the heterogeneous distribu- 
tions of low affinity binding sites along the lengths of dark matter 
RNA molecules in each micro-partition. The result, depicted in 
Figure 3, shows varying levels of sequestration of RNA processing 
components, depending on the systems performance require- 
ments of each component. Overall, higher concentrations of 
effector components equate with faster kinetics and more finely 
tunable regulation, which in turn improve signal to noise ratios 
and system performance. 
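
The link between local concentration, kinetics, and occupancy can be sketched with elementary mass-action relations. This is a minimal illustration, assuming single-site reversible binding under pseudo-first-order conditions. The function names, the dissociation constant, the association rate constant, and the concentrations are all hypothetical values chosen for the example.

```python
def fractional_occupancy(concentration: float, kd: float) -> float:
    """Equilibrium fraction of sites bound (concentration and kd in M)."""
    return concentration / (concentration + kd)


def mean_association_time(concentration: float, k_on: float) -> float:
    """Mean waiting time, in seconds, for a free site to be bound.

    Assumes the protein is in excess, so binding is pseudo-first-order
    with rate k_on * concentration (k_on in 1/(M*s)).
    """
    return 1.0 / (k_on * concentration)


KD = 1e-6    # hypothetical dissociation constant, M
K_ON = 1e6   # hypothetical association rate constant, 1/(M*s)

# Hypothetical bulk level versus an enriched micro-partition.
for label, conc in (("bulk", 1e-7), ("micro-partition", 1e-5)):
    occupancy = fractional_occupancy(conc, KD)
    wait = mean_association_time(conc, K_ON)
    print(f"{label}: occupancy {occupancy:.2f}, mean wait {wait:.1f} s")
```

Under these assumptions, raising the local concentration both shortens the waiting time for binding and moves occupancy toward saturation, which is one simple reading of why sequestration in a micro-partition could speed up and sharpen regulation.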

The temporal and spatial dynamics of intelligent scaffolds
permit integration of signals from many levels of biological
information processing. Changes in the intelligent scaffolding
environment of a three-dimensional chromatin micro-region can
impact the dynamics of transcriptional folding, processing,
localization, and degradation of transcripts as well as chromatin
signaling (see Figure 4). For example, dark matter cleavage events
can quickly change the structure of the micro-domain by
sweeping away large numbers of proteins, RNAs, and scaffold, and at
the same time generate small RNAs, or expose regions of RNA
complementarity to small RNAs, as described in the recent theory
of competing endogenous RNAs (ceRNAs) by the Pandolfi group
(Salmena et al., 2012). Cleavage of very long dark matter RNAs,
for example those coming from the vlinc regions (Kapranov et al.,
2010), could occur even with their RBPs still attached. Cleaved
RNAs could then function as lncRNAs. Small RNAs could also
interact with sites in tethered vlincs, thereby acting as a sink, or
blocking sites that would otherwise be occupied by other
signaling molecules. Under some circumstances, combinations of these
other events could serve as the signal to trigger cleavage of the
vlincs, which could then form a rapid feed-forward circuit as the
cascade of cleavage continues in the entire micro-domain.

CONCLUSION: THE FOREST ENRICHES THE FUNCTIONALITY 
OF THE TREES 

While specific interactions drive the bulk of molecular
information processing in biological systems, in the RNA-based
regulatory networks of the nucleus the performance
characteristics of specific interactions are determined by the surrounding
micro-environment. Dark matter RNA plays a key role in
implementing the dynamic responsiveness of that surrounding
micro-environment. Considering its importance, the concept of
functions for dark matter RNAs should embrace a continuum,
from those that arise from highly specific interactions, to those at
the other end of the spectrum that involve lower affinity and less
specificity, but nevertheless contribute to the synergistic attributes
of the surrounding micro-environment. Those attributes
permit the specific interactions, and facilitate their coordination and
integration.

Evaluating dark matter RNAs in this fashion provides a
context and explanation for the relatively low level of conservation of
these RNAs, as many informational elements either do not require
conservation, or require only functional conservation. As
demonstrated for a growing list of ncRNAs, functionality does not require
conservation, at least not in the same way that is known to occur for
protein-coding sequences (Pang et al., 2006). The theory predicts
increasing concentrations of dark matter complexed with RNA
interacting proteins in complex organisms, and helps explain the
direct correlation of organismal complexity with the genomic
percentage of non-coding regions in all genomes sequenced to date
(Taft et al., 2007). It also suggests expansion of RNA
interacting regions in the proteomes of organisms as evolutionary
complexity increases.

FIGURE 3 | Distribution of different domains with varying ΔG affinities
(Y-axis) for different RBPs (balls of different colors) along the
length of a hypothetical series of micro-domains, each composed of
different (very) long non-coding RNAs (X-axis). The heterogeneous
landscape of dark matter RNAs creates pockets of enrichment of these
different RBPs.

FIGURE 4 | A nascent RNA strand being synthesized by RNA Pol II
interacts with RBPs (balls of various colors), small RNAs and lincRNAs.
Interaction between all these molecules is made possible by their close
proximity in the nuclear micro-domains. RBPs and small RNAs
are bound to (v)lincRNAs with relatively low specificity, and the latter
present them in an exact architectural and temporal environment to the
nascent strand, which possesses specific motifs for the former. Nuclease
action cleaves the non-coding RNA template and thus changes the
structure of the scaffold complex. This in turn can change the kinetics
of the interaction between the RBPs and the small RNAs bound to it
and the sequence of their presentation and interaction with the nascent
strand of RNA.

The dark matter intelligent scaffold concept focuses on the level
of coupling between computation and spatial articulation. The
theory holds that large increases in biological complexity required
ever increasing levels of coupling between computation and
structure, as a key driver of that complexity, and ultimately a measure
of organismal fitness. Dark matter RNA was recruited to perform
this function, to dynamically bridge these two ostensibly
orthogonal dimensions, because its flexible structural and computational
features endow it with special qualities to serve as a
molecular intermediate in the coding, processing, and distribution of
information.